Research on Web Information Extraction Based on Combining Block Importance Model and Xpath

doi:10.3969/j.issn.1006-2475.2009.08.020

Computer and Modernization, 2009, Vol. 8, Issue (8): 73-75,7.

PANG Qiu-ben, GU Ping, YANG Xiao-mei

School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China

Received: 2008-08-29 Revised: 1900-01-01 Online: 2009-08-21 Published: 2009-08-21

Abstract

Page segmentation methods reduce the unit of Web information extraction from the whole page to the block. This paper studies the main page segmentation methods and the learning-based block importance model, and analyzes XPath-based Web extraction. Combining the strengths of the two, it proposes a Web information extraction method that integrates the block importance model with XPath, discusses its design process, and gives a formal description and experimental results. The results show that the method is well suited to extracting from Web pages that contain multiple records.

Key words: page segmentation, block importance weight, XPath, Web information extraction
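
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea: split a page into blocks, score each block, and run an XPath query inside the highest-scoring block. Everything in it is an assumption made for illustration: the sample page, the function names, the use of top-level `div` elements as blocks, and above all the scoring rule, which is a simple hand-written heuristic (amount of text outside links) standing in for the learning-based block importance model that the paper describes. It also assumes the page is well-formed XHTML so that Python's standard `xml.etree.ElementTree` can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical multi-record page (well-formed XHTML assumed).
PAGE = """
<html><body>
  <div id="nav"><a href="/">Home</a><a href="/about">About</a></div>
  <div id="content">
    <ul>
      <li><span class="title">Book A</span><span class="price">10</span></li>
      <li><span class="title">Book B</span><span class="price">12</span></li>
      <li><span class="title">Book C</span><span class="price">15</span></li>
    </ul>
  </div>
  <div id="footer"><p>Copyright</p></div>
</body></html>
"""


def non_link_text_length(elem):
    """Count characters of text under elem that are not inside <a> tags."""
    if elem.tag == "a":
        return 0
    total = len((elem.text or "").strip())
    for child in elem:
        total += non_link_text_length(child)
        total += len((child.tail or "").strip())
    return total


def segment_blocks(root):
    """Stand-in segmentation: treat each top-level <div> as one block."""
    return root.findall("./body/div")


def select_main_block(root):
    """Pick the block with the highest stand-in importance score."""
    return max(segment_blocks(root), key=non_link_text_length)


def extract_records(page_source):
    """Run XPath queries inside the most important block only."""
    root = ET.fromstring(page_source.strip())
    block = select_main_block(root)
    records = []
    for item in block.findall(".//li"):
        records.append({
            "title": item.findtext("span[@class='title']"),
            "price": item.findtext("span[@class='price']"),
        })
    return records


records = extract_records(PAGE)
```

On this sample page the navigation block scores zero because all of its text sits inside links, so the XPath queries run only against the `content` block and return one record per list item. Restricting the query to a single block is what keeps the XPath expressions short and less sensitive to changes elsewhere on the page.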

CLC Number: TP391.1

PANG Qiu-ben, GU Ping, YANG Xiao-mei. Research on Web Information Extraction Based on Combining Block Importance Model and Xpath[J]. Computer and Modernization, 2009, 8(8): 73-75,7.
